Instructions

Below you will find several empty R code chunks and a few places where a line starts with the word “Answer:”. Your task is to fill in the required code and answer the questions as stated.

Eggs Dataset

Today you will be working with a dataset of birds:

Here is a full data dictionary describing all of the variables:

Notice that the last two variables are integer codes. They are stored as numbers but correspond to a category.

Starting plot

Create a scatter plot showing the mass of a male bird (x-axis) and the mass of an egg (y-axis):

You should notice that the plot’s scale makes it hard to see the relationship between the two variables.
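As a rough sketch of the starting plot: the data frame `birds` below is a hypothetical stand-in, since the real dataset's object name is not shown here. Its values are copied from a few rows printed later in this handout.

```r
library(ggplot2)

# Hypothetical stand-in for the course dataset; values are copied from
# rows printed in this handout, and the object name `birds` is assumed.
birds <- data.frame(
  name      = c("Red-winged", "Swift", "Rock", "Regent", "Common"),
  type      = c("Parrot", "Parrot", "Parrot", "Parrot", "Swift"),
  egg_mass  = c(11.50, 5.95, 4.85, 9.40, 3.5),
  male_mass = c(134.7, 64.7, 53.0, 174.7, 39)
)

# Male mass on the x-axis, egg mass on the y-axis
p <- ggplot(birds, aes(x = male_mass, y = egg_mass)) +
  geom_point()
p
```

With the full dataset, a handful of very heavy birds stretch both axes, which is why most points end up crowded into one corner.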

Changing the scale

Now add the layers scale_x_log10() and scale_y_log10() to the plot:
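One way this could look, again using a hypothetical stand-in data frame `birds` built from rows printed in this handout (the real object name is an assumption):

```r
library(ggplot2)

# Hypothetical stand-in for the course dataset (object name assumed)
birds <- data.frame(
  name      = c("Red-winged", "Swift", "Rock", "Regent", "Common"),
  type      = c("Parrot", "Parrot", "Parrot", "Parrot", "Swift"),
  egg_mass  = c(11.50, 5.95, 4.85, 9.40, 3.5),
  male_mass = c(134.7, 64.7, 53.0, 174.7, 39)
)

# Same scatter plot, with both axes drawn on a base-10 log scale
p_log <- ggplot(birds, aes(x = male_mass, y = egg_mass)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
p_log
```

The axis labels still show the original gram values; only the spacing between them changes.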

How would you now describe the relationship between the two variables (I just need one sentence here)?

Answer: It is a positive relationship; the larger the mass of the male bird, the larger the mass of the egg.

Parrots

Create a new dataset called parrots consisting of just those birds that are parrots (hint: use the type variable; double hint: look at the raw data for exactly how to format the filter query):

## # A tibble: 12 x 10
##               genus       species        name   type egg_mass male_mass
##               <chr>         <chr>       <chr>  <chr>    <dbl>     <dbl>
##  1      Aprosmictus erythropterus  Red-winged Parrot    11.50     134.7
##  2         Lathamus      discolor       Swift Parrot     5.95      64.7
##  3         Neophema   chrysostoma Blue-winged Parrot     4.20      45.7
##  4         Neophema    petrophila        Rock Parrot     4.85      53.0
##  5         Neophema     pulchella   Turquoise Parrot     3.90      42.7
##  6     Neopsephotus       bourkii    Bourke's Parrot     3.75      46.0
##  7        Pezoporus      wallicus      Ground Parrot     6.85      78.0
##  8        Polytelis    alexandrae Alexandra's Parrot     7.75      96.0
##  9        Polytelis   anthopeplus      Regent Parrot     9.40     174.7
## 10        Polytelis    swainsonii      Superb Parrot     8.10     153.1
## 11        Psephotus  haematonotus  Red-rumped Parrot     4.50      61.4
## 12 Purpureicephalus       spurius  Red-capped Parrot     7.15     117.3
## # ... with 4 more variables: mating_system <int>, display <int>,
## #   resource <int>, clutch_size <dbl>
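A minimal sketch of the filter step, assuming dplyr is loaded and using a hypothetical stand-in data frame `birds` (values copied from this handout). Note that the value in the type column is capitalized, so the comparison string has to match exactly.

```r
library(dplyr)

# Hypothetical stand-in for the course dataset (object name assumed)
birds <- data.frame(
  name      = c("Red-winged", "Swift", "Rock", "Regent", "Common"),
  type      = c("Parrot", "Parrot", "Parrot", "Parrot", "Swift"),
  egg_mass  = c(11.50, 5.95, 4.85, 9.40, 3.5),
  male_mass = c(134.7, 64.7, 53.0, 174.7, 39)
)

# Keep only the rows whose type is exactly "Parrot" (case-sensitive)
parrots <- filter(birds, type == "Parrot")
parrots
```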

Now add a layer to the previous plot (keeping the log scales) where the parrots are highlighted in the color “red”. To make them stand out, give the base layer an alpha value of 0.15. Finally, add a text annotation telling the reader that the red points are parrots.
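A possible sketch of the layered plot follows. The data frame `birds` is a hypothetical stand-in built from rows in this handout, and the annotation coordinates and wording are arbitrary choices for this small example, not required values.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for the course dataset (object name assumed)
birds <- data.frame(
  name      = c("Red-winged", "Swift", "Rock", "Regent", "Common"),
  type      = c("Parrot", "Parrot", "Parrot", "Parrot", "Swift"),
  egg_mass  = c(11.50, 5.95, 4.85, 9.40, 3.5),
  male_mass = c(134.7, 64.7, 53.0, 174.7, 39)
)
parrots <- filter(birds, type == "Parrot")

p_parrots <- ggplot(birds, aes(x = male_mass, y = egg_mass)) +
  geom_point(alpha = 0.15) +                   # faded base layer: all birds
  geom_point(data = parrots, color = "red") +  # highlight layer: parrots only
  annotate("text", x = 100, y = 4,             # position chosen by eye
           label = "Red points are parrots", color = "red") +
  scale_x_log10() +
  scale_y_log10()
p_parrots
```

The x and y values passed to annotate() are given in the original data units, even though the axes are drawn on a log scale.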

Smoothing line

Now, we are going to add a best-fit line to the plot. We do this by adding geom_smooth(method = "lm") to the plot. Add this to the plot using the log-log scale, but without highlighting the parrots.

I think the best-fit line is a bit too colorful and noisy. Fix it by changing the layer to this instead: geom_smooth(method = "lm", color = "black", se = FALSE, linetype = "dashed", size = 0.5).
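A sketch of the restyled best-fit line on the log-log plot, using a hypothetical stand-in data frame `birds` (values copied from this handout). Because the scale transformation is applied before the smoother runs, the line is a linear fit of log10(egg mass) on log10(male mass); the separate lm() call is only there to illustrate that.

```r
library(ggplot2)

# Hypothetical stand-in for the course dataset (object name assumed)
birds <- data.frame(
  name      = c("Red-winged", "Swift", "Rock", "Regent", "Common"),
  type      = c("Parrot", "Parrot", "Parrot", "Parrot", "Swift"),
  egg_mass  = c(11.50, 5.95, 4.85, 9.40, 3.5),
  male_mass = c(134.7, 64.7, 53.0, 174.7, 39)
)

p_fit <- ggplot(birds, aes(x = male_mass, y = egg_mass)) +
  geom_point() +
  # Call taken from the text above; ggplot2 3.4.0 and later prefer
  # `linewidth` over `size` for lines and will warn about `size`.
  geom_smooth(method = "lm", color = "black", se = FALSE,
              linetype = "dashed", size = 0.5) +
  scale_x_log10() +
  scale_y_log10()
p_fit

# The same straight-line fit, done by hand on the log10 values
fit <- lm(log10(egg_mass) ~ log10(male_mass), data = birds)
coef(fit)
```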

Does the best-fit line match the visual pattern you saw between the size of a bird and the size of its eggs (again, one sentence is sufficient)?

Answer: Yes, the size of the male bird still correlates with the size of the egg.

Outliers

If you look at the plot, you’ll see one bird in particular that has a very large egg given the mass of the bird itself. This is the Red-tailed Tropicbird (also, you can add pictures to R Markdown!):

The tropicbird has a male mass of 218.7g and an egg mass of 87.00g. Annotate this point on the graph and give it a label:
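One way to mark and label the point is sketched below. The data frame `birds` is a hypothetical stand-in; the tropicbird's masses come from the text above, but the values in its name and type columns are assumed for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the course dataset (object name assumed).
# The name/type entries for the tropicbird row are assumptions.
birds <- data.frame(
  name      = c("Red-winged", "Swift", "Regent", "Common", "Red-tailed"),
  type      = c("Parrot", "Parrot", "Parrot", "Swift", "Tropicbird"),
  egg_mass  = c(11.50, 5.95, 9.40, 3.5, 87.00),
  male_mass = c(134.7, 64.7, 174.7, 39, 218.7)
)

p_outlier <- ggplot(birds, aes(x = male_mass, y = egg_mass)) +
  geom_point() +
  # Open circle drawn around the outlier
  annotate("point", x = 218.7, y = 87, shape = 1, size = 5, color = "blue") +
  # Label placed to the left of the point (hjust > 1 shifts it left)
  annotate("text", x = 218.7, y = 87, label = "Red-tailed Tropicbird",
           hjust = 1.1, color = "blue") +
  scale_x_log10() +
  scale_y_log10()
p_outlier
```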

Your turn

Construct one final graph of the data. You are free to use the other variables that we did not look at yet or to look at different classes of birds. For this graph (only), please add an appropriate title and annotations.

## # A tibble: 1 x 10
##   genus species   name  type egg_mass male_mass mating_system display
##   <chr>   <chr>  <chr> <chr>    <dbl>     <dbl>         <int>   <int>
## 1  Apus    apus Common Swift      3.5        39             2       3
## # ... with 2 more variables: resource <int>, clutch_size <dbl>
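As one possible direction only: the output above shows a single Common Swift, so a final graph could highlight that bird and carry a title and axis labels. The data frame `birds` is a hypothetical stand-in built from rows in this handout, and the title, label text, and colors are arbitrary choices.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for the course dataset (object name assumed)
birds <- data.frame(
  name      = c("Red-winged", "Swift", "Rock", "Regent", "Common"),
  type      = c("Parrot", "Parrot", "Parrot", "Parrot", "Swift"),
  egg_mass  = c(11.50, 5.95, 4.85, 9.40, 3.5),
  male_mass = c(134.7, 64.7, 53.0, 174.7, 39)
)
swift <- filter(birds, type == "Swift")

final_plot <- ggplot(birds, aes(x = male_mass, y = egg_mass)) +
  geom_point(alpha = 0.15) +
  geom_point(data = swift, color = "blue") +
  annotate("text", x = 39, y = 3.5, label = "Common Swift",
           hjust = -0.1, color = "blue") +   # label just right of the point
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "Egg mass versus male body mass",
       x = "Male mass (g, log scale)",
       y = "Egg mass (g, log scale)")
final_plot
```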